INTRODUCTION


Endogenous  retroviral  (ERV)  like  sequences  can  represent  agents  of  genetic  variation 
within  an  organism  through  retrotransposition.  Transcriptional  activation  of  ERVs  may  lead  to 
reintegration of the provirus into the host’s genome causing mutations (1). These alterations may
take  the  form  of  enhancer  mutations  affecting  genes  at  sites  distant  from  the  point  of  integration, 
promoter  mutations  where  genes  are  directly  activated  by  the  integration  or  disruption  mutations 
where  the  retroviral  sequences  become  incorporated  into  an  adjacent  gene.  The  disruption 
mutations  may  add  sequences  5’  through  use  of  the  LTR  promoter,  internally  via  splicing  or  3’  by 
donation of polyadenylation signals (2). All of these chimeras can be assayed through differential
screening  with  probes  specific  for  elements  within  the  retroviral  element.  Hybridization  with 
probes  specific  for  the  gag,  pol  and  env  regions  will  identify  the  ERV  transcripts  contained  within 
the  given  cDNA  library.  Rehybridization  with  probes  specific  for  the  LTR  regions  will 
discriminate  between  full  length  retroviral  transcripts  (represented  by  clones  which  hybridized  to 
both  sets  of  probes)  and  those  which  are  potentially  chimeric  with  LTR  sequences  (hybridized  to 
only  the  LTR  probes).  This  method  of  differential  hybridization  has  been  used  successfully  by 
Mager and colleagues to identify novel genes from an NTera2D1 cDNA library (2,3).
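
The decision rule behind the two rounds of hybridization can be written out as a short sketch. This is only an illustration of the logic described above, not software used in the study; the function name, the category labels and the example clone names are hypothetical.

```python
def classify_clone(hybridizes_internal: bool, hybridizes_ltr: bool) -> str:
    """Classify a cDNA clone from two rounds of probe hybridization.

    hybridizes_internal: signal with the gag, pol or env probes.
    hybridizes_ltr:      signal with the LTR probes on rehybridization.
    """
    if hybridizes_internal and hybridizes_ltr:
        return "full-length retroviral transcript"
    if hybridizes_ltr:
        return "candidate LTR chimera"
    if hybridizes_internal:
        return "internal retroviral sequence only"
    return "no retroviral sequence detected"


# Hypothetical screening results: clone -> (gag/pol/env signal, LTR signal)
results = {
    "clone_A": (True, True),
    "clone_B": (False, True),
    "clone_C": (False, False),
}
calls = {name: classify_clone(*signals) for name, signals in results.items()}
```

Only clones in the "candidate LTR chimera" class would be carried forward for sequencing under this scheme.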

Previously it was reported that several DMBA (7,12-dimethylbenz[a]anthracene)-induced
mouse mammary carcinomas overexpressed the endogenous retrotransposon intracisternal A
particles (IAPs). Indeed, Asch and Asch found that high expression of IAP RNA and protein is
present in many mouse mammary tumors and preneoplasias, whereas little or no expression is
detected in normal mammary glands from virgin, pregnant, lactating or involuting mice (4,5,6).
Therefore changes in IAP expression frequently occur during the progression to tumorigenesis but
not  during  normal  growth  and  differentiation  cycles  of  the  mammary  gland  (5,6).  For  this  reason  a 
cDNA library from a DMBA-induced mouse mammary tumor which expressed the highest levels
of IAP was constructed and screened by differential hybridization. Several clones were isolated
which demonstrated chimerism between the IAP molecule and a unique cellular transcript. Many
of these transcripts were found to be altered in neither expression pattern nor transcript size
between normal and tumor tissues upon Northern analysis. One clone, however, showed both
altered transcript size and tumor specific expression. The clone, p17b, later renamed kokopelli,
was chimeric with IAP sequences; however, the association was deemed artifactual due to the
presence of two polyadenylation signals and tails within the same clone. Sequence analysis of the
isolated  transcript  demonstrated  a  murine  B1  repetitive  element  in  association  with  part  of  the  U1 
snRNP  specific  protein  C  gene  separated  by  sequences  not  found  in  the  database.  Expression  of 
the  tumor  specific  transcript  was  documented  in  several  mouse  mammary  tumors  of  various 
etiologies  as  well  as  in  a  series  of  mouse  mammary  cell  lines  which  vary  in  their  ability  to  grow 
tumors  in  nude  mice.  The  transcript  which  did  not  show  tumor  specific  expression  was  expressed 
in  all  mouse  tissues  and  cell  lines  examined.  This  transcript  is  postulated  to  represent  the  normal 
counterpart  to  the  tumor  associated  transcript. 

Since only a partial clone was isolated in the original library screen, the full length tumor
and normal transcripts were sought. Rescreening of the DMBA-induced tumor library resulted either in
transcripts which were unrelated to kokopelli or in isolation of the full length U1 snRNP specific
protein C gene. Rapid amplification of cDNA ends was attempted but yielded no new transcript
information as the products terminated within the B1 repetitive element of kokopelli. The efforts to
isolate the transcript from normal tissues and to characterize the 3’ end of the tumor transcript are
the  subjects  of  this  report. 

EXPERIMENTAL  METHODS 

cDNA Library Screening. The kidney (λZAP, Stratagene), brain and liver (λgt10, Clontech)
cDNA libraries used for screening were generous gifts from Dr. Deborah Nagle of Millennium
Pharmaceuticals. The kidney cDNA library was separated into 10 pools of approximately 1 x 10^
clones per pool. These pools were screened with PCR primers p17bGSP4 and p17bGSP6 (Fig. 1)
and the positive pools were subdivided into pools of approximately 1 x 10^ clones per pool. These
pools were screened by PCR as described above, and the positive pools were plated onto NZCYM by
standard protocols (7). The phage were transferred to nylon membranes (MSI, Westboro, MA),
the filters were divided into 4 quadrants and the phage were eluted off the membrane in SM medium. The
eluted phage were screened by PCR as described above. The resulting positive pools were then
plated and screened via hybridization using an amplicon from primers p17bGSP4 and p17bGSP6
by standard methods. The brain and liver cDNA libraries were screened by hybridization using a
660 bp Hae III probe derived from the 3’ end of clone pK823 (see below).

cDNA  Cloning  and  Sequencing.  All  clones  derived  from  the  kidney  cDNA  library  were 
excised  from  the  phagemid  as  described  by  the  manufacturer  (Stratagene).  The  lambda  clones 
isolated from the mouse liver and brain specific libraries were plaque purified and the DNA isolated
via liquid lysate (8). Inserts were liberated from the λgt10 vectors by EcoRI digestion. The inserts
were purified from a 1% agarose gel using Geneclean (Bio 101) and ligated into the EcoRI site of
pGEM7Z(+). Sequence analysis was performed in Roswell Park’s core facility using the ABI
automated DNA sequencer. The M13 universal forward and reverse primers were used in all
sequencing reactions along with gene specific primers where indicated. All sequences were
analyzed using either the FASTA and BESTFIT algorithms of the GCG program maintained by the
University of Wisconsin at Madison or the BLAST algorithm at the NCBI.

Rapid Amplification of cDNA Ends (RACE). The Marathon-Ready kidney cDNA
(Clontech) was used to generate 5’ extensions of clone pK823 (see Results). Primers used for
5’ RACE were: K823-1 (5’CCCCAGGTGGAGATTTGTCTAC3’) and K823-2
(5’CACGCTGTATGATCTCCCGGAG3’), 494 bp and 296 bp from the 5’ end of pK823,
respectively, and K823-3 (5’CCCACTATAAATATACAGCTCCATGGGCTTC3’) and K823-4
(5’TCCCCTGTCTGTTCACCCAGG3’), 275 bp and 140 bp from the 5’ end of clone pK823-2,
respectively. A hot start reaction was utilized in every case. Two rounds of PCR were done for
each 5’ extension. The 5’ extension of clone pK823 was conducted with primer K823-1
using 35 cycles of denaturation at 94°C for 45 sec, annealing at 62°C for 45 sec and
extension at 72°C for 3 min, followed by a nested reaction with primer K823-2 using the same PCR
conditions.  The  initial  5’  extension  of  clone  pK823-2  was  done  with  primer  K823-3  using  the 
following conditions: 94°C for 30 sec, 70°C for 5 min for 10 cycles then 94°C for 30 sec, 67°C for
5 min for an additional 25 cycles. The high annealing temperature of the first 10 cycles allows for
gene specific primer binding but not for binding of the upstream anchor primer. A nested round of PCR was
done  with  primer  K823-4  using  94°C  for  45  sec,  65°C  for  30  sec  and  72°C  for  3  min  for  30 
cycles.  All  RACE  PCR  products  were  excised  from  a  1.5%  agarose  gel,  genecleaned  and  cloned 
into  pGEM  T/A  Easy  (Promega).  To  facilitate  sequence  analysis  of  the  K823  RACE  products 
several  subclones  were  generated  by  digestion  with  various  restriction  enzymes  followed  by 
religation  of  the  vector. 

For 3’ RACE, total RNA from a D2 tumor and normal mammary gland from a virgin animal
was extracted from fresh frozen tissue using the TRI reagent (Molecular Research Center, Inc.).
The RNA was poly(A)+ selected using oligo dT column chromatography as described by the
supplier (Molecular Research Center, Inc.). One to 3 micrograms of poly(A)+ RNA was reverse
transcribed with Superscript II RT (Gibco, BRL) at 45°C using the 3’ RACE oligo
(5’GGCCTAGGCCTTAAGGGCCCTAC(T),23’)  as  described  by  the  manufacturer.  The  initial 
PCR was done using p17bGSP4 as the upstream gene specific primer in conjunction with the 3’
RACE primer (5’GGCCTAGGCCTTAAGGGCCCTAC3’). PCR amplification consisted of a hot
start reaction, 95°C for 5 min, 80°C hold where buffer, polymerase and MgCl2 were added,
followed by 25 cycles of 94°C for 45 sec, 58°C for 45 sec and 72°C for 3 min. A nested round of
PCR using primer p17bGSP5R as the upstream gene specific primer was done under the same
conditions.  PCR  was  done  with  the  High  Fidelity  PCR  kit  from  Boehringer  Mannheim 
(Indianapolis,  IN).  Amplification  products  were  run  on  a  1.5%  agarose  gel  and  cloned  as 
described  above. 

Northern Analysis. Total RNA (15 µg) or poly(A)+ RNA (1-5 µg) from several murine tumors and
normal tissues was electrophoresed through a 1.2% agarose, 2.2 M formaldehyde gel and transferred to
nylon membranes by standard protocols. Northern blots were either cut into strips (strip blots) or
left intact for hybridization. Probes for hybridization were generated by random priming (9).
Hybridizations were done in 50 mM NaPO4, 1% BSA, 7% SDS and 1 mM EDTA at 65°C for 16 to
20 hours. Blots were rinsed in 2X SSC then washed stringently in 0.2X SSC and 0.1% SDS at
58°C for 1 hour. Filters were subjected to autoradiography at -80°C under an intensifying screen
from 2 days to 2 weeks depending on the probe.

Genomic  Localization  of  pK823.  A  C57BL/6  X  Mus  spretus  backcross  panel  from  the 
Jackson Labs (Bar Harbor, ME) was screened via hybridization with an 800 bp Hae III fragment of
pK823. This probe represents the extreme 3’ end of the known sequence and does not contain
any  repetitive  elements.  Hybridizations  were  done  on  the  backcross  panel  digested  with  Pst  I  (a 
generous  gift  from  Dr.  Rosemary  Elliott,  Roswell  Park  Cancer  Institute).  The  mapping  data  were 
analyzed  using  Map  Manager  v2.0  (Ken  Manly,  Roswell  Park  Cancer  Institute). 

RESULTS  and  DISCUSSION 

Isolation of pK823. A PCR based method of screening a λZAP mouse kidney cDNA library
was employed to isolate the normal counterpart to the tumor specific transcript p17b (Kokopelli).
Screening of library pools with primers specific for Kokopelli, followed by filter hybridization with
a probe derived from the primers used to screen the pools, produced several positive clones. One
of these clones, pK823, contained a 1.9 kb insert which hybridized to a 6.6 knt transcript in mouse
heart and a DMBA-induced tumor along with the 1.4 knt transcript found in the DMBA-induced
tumor on a Northern strip blot (Fig. 2). The probe did not hybridize to the 6.6 knt transcript from
total mouse liver RNA but a faint hybridization signal was detected to the 1.4 knt transcript (Fig. 2,
center). pK823 did not hybridize to the 1.4 knt transcript from total mouse heart RNA (Fig. 2,
center). Upon sequence analysis of the 5’ end of pK823 a murine B1 element homology region
was discovered (Figs. 2 and 3). Figure 3 shows 673 bp of sequence from pK823 in which the B1
element homology region is underlined. This homology region covers 151 bp of pK823 and is
87% identical to the mouse B1 repetitive element found in Genbank (Fig. 3B). Sequences 3’ to the
B1  repetitive  element  had  no  homology  to  any  sequences  within  the  NCBI  database.  The  unique 
portions  of  pK823  had  no  sequence  homology  with  Kokopelli  outside  of  the  B1  element  (data  not 
shown). To determine if the B1 element was responsible for the hybridization signals seen on the
Northern strip blots, pK823 was digested with Hae III which resulted in fragment sizes of
approximately 660 bp, 405 bp, 340 bp, 107 bp, 42 bp, 40 bp, and 13 bp. The 660 bp fragment
represented the extreme 3’ terminus and the 405 bp fragment represented an internal fragment,
neither of which contained the B1 repetitive element. Both of these probes hybridized to the
correct transcripts on Northern blots (Fig. 2). Both the 5’ and 3’ ends of pK823 hybridize strongly
to the 6.6 knt transcript from kidney; however, the probes differ in hybridization to other tissues (Fig.
2, right and left panels). Also, the probes hybridize to the 1.4 knt transcript, previously found
expressed only in tumors (Fig. 2 and the previous report). These results suggest that the 1.4 knt
transcript may not be expressed tumor specifically but rather upregulated in mouse mammary
tumors. Upregulation may come at the expense of the larger 6.6 knt transcript, as Figure 2, right
panel, shows increased expression of the 1.4 knt transcript but little expression of the 6.6 knt
transcript in both the DMBA-induced and hormonally induced (Dim3) tumors. The RNA from a
mammary gland isolated from a virgin mouse shows little hybridization to either band (Fig. 2,
right). The ethidium bromide staining of the gel prior to transfer showed approximately equal loads
of all the RNA (data not shown). In addition, the results also suggest that the B1 element alone
was not responsible for the Northern hybridization pattern seen despite the apparent lack of
sequence similarity between pK823 and Kokopelli outside the B1 element. Repeated attempts to
hybridize the B1 portion of the clone have not been successful (data not shown).
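
The Hae III fragment pattern can be checked against sequence data with a simple in silico digest. The sketch below assumes a linear insert and uses the known Hae III recognition site (GGCC, blunt cut between G and C); the function name and the toy sequence are hypothetical, and the fragment list is copied from the sizes reported above.

```python
from typing import List


def digest_linear(seq: str, site: str = "GGCC", cut_offset: int = 2) -> List[int]:
    """Return fragment lengths, 5' to 3', after cutting a linear sequence.

    cut_offset is the number of bases of the site left on the 5' fragment
    (2 for Hae III, which cuts GG^CC).
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy sequence with two Hae III sites (illustration only, not pK823 sequence).
toy_fragments = digest_linear("AAAAGGCCTTTTTTGGCCAA")

# Approximate fragment sizes reported in the text for the pK823 digest.
REPORTED_FRAGMENTS_BP = [660, 405, 340, 107, 42, 40, 13]
reported_total_bp = sum(REPORTED_FRAGMENTS_BP)
```

Summing the reported sizes gives a total that can be compared with the insert size estimated from the gel, which is a quick consistency check on a restriction map.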

Genomic Localization of pK823. Kokopelli (p17b) was previously mapped to the proximal
end of chromosome 17 (our unpublished results). In order to determine if pK823 was part of the
same locus and thus represents an alternative product of that locus, the Jackson Labs (C57BL/6 X
Mus spretus) X Mus spretus backcross was analyzed using the 660 bp Hae III fragment of pK823.
Backcross analysis using Map Manager localizes the pK823 clone to mouse chromosome 7 (Fig.
4). This result suggests that pK823 and Kokopelli are not the same gene but may instead be
members  of  a  larger  gene  family. 
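
For readers unfamiliar with backcross linkage statistics, the standard two-point LOD score can be sketched as follows. This is the textbook formula and is offered only as an assumption about the kind of calculation involved; how Map Manager computes its scores is not described in this report, and the recombinant counts used below are hypothetical.

```python
import math


def lod_score(n_progeny: int, n_recombinants: int) -> float:
    """Two-point LOD score for a backcross.

    Compares the likelihood at the maximum likelihood recombination
    fraction (theta = recombinants / progeny) with the likelihood under
    free recombination (theta = 0.5).
    """
    theta = n_recombinants / n_progeny
    log_likelihood = 0.0
    if n_recombinants > 0:
        log_likelihood += n_recombinants * math.log10(theta)
    if n_progeny - n_recombinants > 0:
        log_likelihood += (n_progeny - n_recombinants) * math.log10(1 - theta)
    return log_likelihood - n_progeny * math.log10(0.5)


# Hypothetical examples: complete linkage and no linkage.
lod_complete = lod_score(94, 0)
lod_unlinked = lod_score(10, 5)
```

With 94 progeny and no recombinants the score is 94 x log10(2), or about 28.3, which is the largest value obtainable from a panel of that size; each recombinant lowers it.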

5’ RACE of pK823. Although pK823 and Kokopelli do not reside on the same chromosome
and therefore are not the same gene, Northern analysis suggests they recognize similar transcripts
which are differentially expressed between normal tissues and tumors. It is possible that the
sequence obtained to this point is entirely noncoding and the two genes are indeed members of a
larger family in which the B1 repetitive element is common at the 3’ end but sequences
downstream are unique. In other words, the 3’ terminal exon of the gene family is marked by the
presence of the B1 repetitive element although the exons themselves are not similar. Since
substantial sequence 5’ to the B1 element had not been isolated from clones pK823, pB311 or
pL11-6 (see below), 5’ RACE of the pK823 clone from kidney using the Marathon-Ready cDNA
kit (Clontech) was attempted (Fig. 5A). Screening the cDNA with primers K823-1 and K823-2
(see Methods) resulted in a 1.5 kb extension of pK823. Sequence analysis shows 100% identity
from K823-2 through the end of pK823 (Fig. 5B). This represents approximately 300 bp of
sequence overlap. pK823-2 was digested with EcoRI and HincIII to isolate a probe fragment from
which the B1 repetitive element had been removed (Fig. 5A). This probe hybridized to the 6.6 knt
transcript in RNA from normal tissues, kidney and brain, but not in RNA isolated from tumors
(Fig. 5C). This probe detected the 1.4 knt transcript in all tissues except brain (Fig. 5C).
Therefore, given the identical sequence overlap between the two clones and hybridization to the same
transcripts on Northern blot analysis, these results suggest that pK823-2 is a bona fide extension of
clone pK823. The extension has been completely sequenced in both directions yet there is no
apparent open reading frame, which suggests that this clone may still be within the 3’ untranslated
region of the transcript.
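
A search for open reading frames of the kind described can be sketched as a six-frame scan. This is a minimal illustration under simple assumptions (ATG start, standard stop codons, only ORFs closed by a stop are counted); it is not the analysis tool used for pK823-2, and the function names and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_codons(seq: str) -> int:
    """Length in codons (ATG included, stop excluded) of the longest ORF
    found in any of the six reading frames."""
    best = 0
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            in_orf = False
            run = 0  # codons counted since the current ATG
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon in STOP_CODONS:
                    if in_orf:
                        best = max(best, run)
                    in_orf, run = False, 0
                elif in_orf:
                    run += 1
                elif codon == "ATG":
                    in_orf, run = True, 1
    return best


# Toy sequence containing ATG AAA TTT TAG in the third forward frame.
toy_orf = longest_orf_codons("CCATGAAATTTTAGCC")
```

In practice a minimum length threshold would be applied, since short ORFs occur by chance in noncoding sequence such as a 3’ untranslated region.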

A  second  round  of  5’  RACE  with  primers  designed  from  the  5’  end  of  pK823-2  (K823-3 
and  K823-4,  see  Methods)  extended  the  pK823  sequence  another  1.6  kb  (Fig.  6A).  Sequence 
analysis again shows 100% sequence identity for the 117 bp of overlap between the 3’ end of
pK823-4 and the 5’ end of pK823-2 (Fig. 6B). However, when this clone (pK823-4) was used as a
probe on Northern blots it did not hybridize to the same transcripts as pK823-2 or pK823. In fact
pK823-4 hybridized to transcripts of approximately 1.8 knt and 1.9 knt in the tissues examined
(Fig. 6C). This clone was not fully sequenced. Further attempts to isolate other 5’ extensions
have generated clones of various sizes, all of which are identical to pK823-4 by restriction digest analysis
(data  not  shown). 

cDNA Library Screens. In an attempt to isolate longer clones from normal tissues, mouse
brain and liver cDNA libraries were screened by filter hybridization with probes corresponding to
either kokopelli or pK823. Two positive clones, pB311 from the brain cDNA library and pL11-6
from the liver cDNA library, were subjected to Northern strip blot analysis (Fig. 7A and data not
shown). Both of these clones hybridized to the 6.6 knt RNA transcript from normal and tumor
tissues, and to the 1.4 knt transcript found only in the tumors (Fig. 7 and data not shown). Again
the probe hybridized weakly to the 1.4 knt RNA transcript in normal tissues (Fig. 7A).
Interestingly, clone pB311 hybridized to only the 1.4 knt transcript in a D2 tumor (Fig. 7A).
Hybridization was also seen to smaller RNA transcripts and probably represents hybridization of the
B1 portion of the clone to expressed B1 elements or to the 7SL RNA from which B1 elements are
derived (Fig. 7A). Complete sequence analysis of pB311 and pL11-6 revealed the presence of a
murine B1 repetitive element in each clone (Fig. 7B and C). The B1 repetitive element homology
regions from both pB311 (Fig. 7B) and pL11-6 (Fig. 7C) are underlined. Homology to the B1
repetitive element ranged from 85% over 164 bp for pB311 to 80% over 133 bp for pL11-6.
There was no significant sequence homology between pB311 and pL11-6 or between either clone and
Kokopelli or pK823 (data not shown). The B1 repetitive elements from all four clones were
only 80-85% homologous. pB311 and pL11-6 were not analyzed further.
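
Percent identity figures such as those quoted for the B1 homology regions are computed from a pairwise alignment. The sketch below shows one common definition (matching columns divided by alignment length, gaps counted as mismatches); the exact definition used by BESTFIT may differ, so this is an assumption, and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Gap characters ('-') are allowed and never count as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments: one mismatch in ten columns.
identity_example = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
# Hypothetical alignment with a single gap in five columns.
identity_gapped = percent_identity("AC-GT", "ACTGT")
```

Under this definition, 85% over 164 bp corresponds to roughly 139 matching positions and 80% over 133 bp to roughly 106.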

3’ RACE for Kokopelli. In the previous report Kokopelli was found to be chimeric with two
other genes, namely IAP and U1 snRNP specific protein C. The association with the IAP
molecule was deemed a recombination artifact due to the presence of two poly(A)+ tails and signals
within the clone. The association with the U1 snRNP was, however, not fully investigated.
Therefore, in order to isolate the true 3’ end of the Kokopelli gene, 3’ RACE was undertaken using
primers upstream of the unique region/snRNP breakpoint. A D2 tumor which expressed high
levels of the 1.4 knt transcript yet very low to undetectable levels of the 6.6 knt transcript (see Fig.
7A) was subjected to reverse transcription using a unique oligo dT adapter primer (see Methods).
The resulting cDNA was then subjected to two rounds of nested PCR which generated an
approximately 600 bp amplicon. Sequence analysis of the amplicon revealed the same structure as
that found in the original clone (Fig. 8A). Comparison of the two clones reveals that they
were not 100% identical (Fig. 8A). These results suggest that the tumor associated transcript was
indeed chimeric with the U1 snRNP specific protein C. Furthermore, the sequence from the clone
isolated from the D2 tumor predicts an in-frame association which alters the downstream amino acid
sequence and hence the protein of U1 snRNP specific protein C (Fig. 8B).

To  determine  if  the  same  structural  arrangement  was  seen  in  normal  tissues,  3’  RACE  was 
conducted on poly(A)+ selected RNA from a mammary gland from a virgin mouse exactly as
described above. The amplicon derived from the normal mammary gland was of similar size to
that isolated from tumors. Sequence analysis demonstrated the same structural arrangement as that
seen previously (Fig. 8A). These data suggest that the 3’ end of Kokopelli is chimeric with
sequences from the snRNP specific protein C. The mouse gene for the U1 snRNP specific protein
C  has  recently  been  cloned  and  it  was  determined  that  there  are  many  copies  of  the  gene  in  the 
genome  (10).  The  sequence  identified  from  the  D2  tumor  and  mammary  gland  may  indeed 
represent  the  correct  sequence  of  the  transcript  as  they  were  isolated  using  a  high  fidelity 
polymerase  mixture  not  present  in  the  original  cloning  of  kokopelli.  Taken  together,  these  data 
would  suggest  that  the  association  of  the  unique  regions  of  Kokopelli  and  the  snRNP  described 
above  is  not  artifactual  but  rather  genuine  and  encodes  for  a  protein  with  a  unique  function. 
However,  the  protein  predicted  starts  from  the  nested  primer  used  in  the  3’  RACE  procedure  and  is 
rather  tenuous  as  the  entire  coding  region  of  kokopelli  has  not  been  isolated  from  any  tissue  to 
date.  Only  after  the  complete  transcript  has  been  isolated  can  predictions  be  made  on  possible 
protein  translation  and  function.  Attempts  to  isolate  the  5’  end  of  Kokopelli  continue. 

RECOMMENDATIONS ON THE S.O.W.


For  the  current  period  of  the  grant  the  focus  was  on  isolation  of  the  normal  counterpart  to 
the  tumor  associated  transcript.  It  has  been  postulated  that  the  3’  end  of  kokopelli,  as  determined 
by  Northern  analysis,  is  common  between  the  normal  and  tumor  associated  transcripts  (the  6.6  knt 
and  1.4  knt,  respectively).  This  commonality  suggests  the  alteration  which  has  occurred  in  the 
tumor  transcript  was  5’  to  the  known  sequence.  Therefore,  by  isolation  of  the  full  length  normal 
transcript  probes  could  be  generated  along  the  6.6  knt  to  use  on  Northern  blots  to  determine  what 
sequences  were  contained  within  the  tumor  specific  transcript.  To  this  end  several  normal  cDNA 
libraries  were  screened  to  isolate  the  full  length  normal  transcript,  as  outlined  in  task  2  (months  12- 
36).  Three  different  transcripts  from  3  different  libraries  hybridized  to  the  same  transcripts  via 
Northern  analysis  as  kokopelli,  however  these  clones  shared  no  sequence  homology  outside  a 
common  murine  B1  repetitive  element.  Further  attempts  by  library  screening  and  PCR  techniques 
to  isolate  both  the  full  length  tumor  associated  transcript  and  the  full  length  normal  transcript  are 
currently  underway. 

Task  2  also  outlined  functional  assessment  of  the  cloned  gene.  This  task  is  difficult  to 
accomplish until the full length transcripts have been identified. Sequence analysis of the 3’ RACE
products from the D2 tumor and from the normal mammary tissues demonstrated an open reading
frame. Amino acid translation was carried out but no recognizable functional domains were
detected (e.g. transactivation domains, DNA binding domains, transmembrane domains).
This analysis is tenuous due to the incomplete nature of the clone.

Task 3, Southern analysis of normal and tumor tissue, has been performed; however,
due to hybridization problems the experiment has not been successful on the full panel of
DMBA-induced tumors. Also, determination of the clone’s role in oncogenic transformation will have to
wait until the full length tumor associated transcript is isolated. Isolation of the genomic locus of
the gene from normal and tumor tissue may, at this late date, not be achievable. PAC and BAC
mouse libraries are available for screening but emphasis should be placed on isolation of the cDNA
before genomic clones are obtained.

Aspects of task 4 have been accomplished, namely screening of tumors arising from various
etiologies and the DMBA-induced series of tumors for the occurrence of the tumor specific
transcript. However, analysis of the mutant gene’s role in development of DMBA-induced
tumorigenesis may be neither achievable nor necessary. Experiments utilizing the TM series of
mouse tumor cell lines have indicated that expression of the 1.4 knt transcript may be a very early event in
tumor  progression.  The  TM3L  cells  are  hyperplastic  and  do  not  usually  form  tumors  in  nude  mice 
yet  the  1.4  knt  transcript  was  expressed  in  those  cells.  In  addition,  the  non-neoplastic  cell  line 
NOG-8  also  demonstrated  expression  of  the  1.4  knt  transcript  suggesting  an  early  role  for  the  gene 
in  tumor  formation. 

CONCLUSIONS 

Isolation of a novel gene fragment from a DMBA-induced mammary carcinoma has led to
the possible identification of a large gene family with a unique genomic organization. Sequence
analysis of transcripts isolated from at least three different sources of RNA (DMBA-induced tumor,
D2 tumor and normal mammary gland from a virgin mouse) demonstrates unique sequences
chimeric and in frame with the U1 snRNP specific protein C. These sequences are expressed in all
normal tissues examined as a 6.6 knt transcript while mammary tumors and neoplastic and
hyperplastic cell lines express a unique 1.4 knt transcript. Clones isolated from a normal mouse
kidney cDNA library and the DMBA-induced tumor cDNA library hybridized to the 6.6 knt
transcript on Northern analysis yet map to different regions of the genome by genetic backcross
analysis, supporting a multigene hypothesis. However, only the original isolate, kokopelli, has the
unique  chimeric  structure  with  the  U1  snRNP  specific  protein  C.  All  potential  members  of  the 
gene  family  share  a  murine  B1  repetitive  element  in  their  putative  3’  untranslated  regions. 

Figure 1. Schematic representation of kokopelli. The mouse B1 repetitive element is shown as the
dark striped box and is 256 bp in length. The U1 snRNP specific protein C is shown as the
darkly stippled box. The lightly stippled box represents the PCR amplicon from primers GSP-4 and
GSP-6 used as a probe on Northern blots and library screens. The PCR primers used for 3’ RACE
are shown with their orientation. The light boxes are unique sequences.

Figure 2. Characterization of pK823. An approximately 1.9 kb clone was isolated from a normal
mouse kidney library via PCR and hybridization with kokopelli primers and probes, respectively.
A partial restriction map is shown. Not all the Hae III sites are depicted, for clarity. A 1.7 kb
EcoRI fragment was isolated and used as a probe on Northern strip blots (center panel). The probe
hybridized to an approximately 6.6 knt transcript in heart and tumor but not liver and to a 1.4 knt
transcript in the tumor lane only (center panel). Two sub-fragments of the clone, a 405 bp Hae III
fragment and a 660 bp Hae III fragment, were used as probes on Northern strip blots to determine
if  the  B1  repetitive  element  was  responsible  for  the  hybridization  pattern  seen  with  the  full  length 
probe.  Both  the  3’  probe  (660  bp  Hae  III)  and  the  5’  probe  (405  bp  Hae  III)  hybridized  to  the 
same  bands,  a  6.6  knt  in  all  tissues  and  a  1.4  knt  in  tumors,  left  and  right  panels.  The  18S  and 
28S  bands  are  marked.  The  B1  element  homology  region  is  shown  as  a  dark  striped  box.  The 
arrows  above  the  clone  are  the  direction  and  approximate  length  of  sequencing.  The  entire  clone 
has  been  sequenced  in  both  directions,  see  Fig.  5. 

Figure  3.  Sequence  comparison  between  pK823  and  the  mouse  B1  repetitive  element.  A.  The 
first 673 bp of pK823 are shown and the B1 repetitive element homology region is underlined. B.
Sequence comparison using the Bestfit algorithm in GCG of the B1 region of pK823 (top strand)
and  the  mouse  B1  repetitive  element  (bottom  strand).  There  is  87%  homology  between  the  B1 
elements  covering  151  bp  of  pK823  sequence.  The  mouse  B1  repetitive  element  can  be  found  in 
Genbank  under  accession  Gb_ro:Mmblr. 
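
A percentage such as the one quoted for Figure 3B can be checked by counting identical columns in the gapped alignment. The sketch below is only an illustration under stated assumptions: the function name `percent_identity` is hypothetical, the denominator is taken to be the number of aligned columns (Bestfit's own reporting convention may differ), and the input is just the 50-column block (pK823 133-182, B1 335-384) from the Figure 3B page, not the full 151 bp region, so it is not expected to reproduce the 87% value.

```python
def percent_identity(top: str, bottom: str, gap_chars: str = ".-") -> float:
    """Percent of aligned columns in which both strands carry the same base.

    Assumption: identity is reported over all aligned columns, gaps included.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strands must have the same length")
    if not top:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(top.upper(), bottom.upper())
        if a == b and a not in gap_chars
    )
    return 100 * matches / len(top)


# 50-column block taken from the Figure 3B alignment page.
pk823_block = "ACAGAGTAAGTTCCAGGACAGCCAGGGCTACACAGAGAAACCCTGTCTTG"
b1_block = "ACAGAGTGAGTTCCAGAACAGCCAGGGCTACACAGAGAAACCCTGTCTCG"

print(percent_identity(pk823_block, b1_block))  # 47 of 50 columns match -> 94.0
```

The same function applies unchanged to the overlap alignments in Figures 5B and 6B, where a `.` marks a gap.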

Figure 4. Mapping of pK823. The (C57BL/6 X Mus spretus) X Mus spretus backcross panel
from the Jackson Laboratory, digested with Pst I, was a gift from Dr. Rosemary Elliott, Roswell Park
Cancer Institute. The 660 bp Hae III fragment of pK823 (see Figure 2) was used as a probe
against the backcross panel. The probe recognized a 1.5 kb allele from C57BL/6 and an 800 bp
allele from Mus spretus. This polymorphism was followed in 94 backcross progeny. The top of the
figure represents haplotype analysis of the middle of chromosome 7. The lower panel shows
diagrammatically the position of pK823 on chromosome 7 with distance in cM on the left side. The
lod score for pK823 with D7Trk1 was 28.1, showing convincing linkage with this marker.
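
For readers unfamiliar with lod scores, the following sketch shows the standard two-point calculation for a backcross: the log10 ratio of the likelihood at the estimated recombination fraction to the likelihood at 0.5 (no linkage). The function name `lod_score` and the recombinant counts used in the example are hypothetical; the legend gives only the number of progeny (94) and the final score, so this is not a reconstruction of how the 28.1 value was obtained.

```python
import math


def lod_score(n_meioses: int, n_recombinants: int) -> float:
    """Two-point lod score at the maximum-likelihood recombination fraction.

    LOD = log10( theta^k * (1 - theta)^(n - k) / 0.5^n ), with theta = k / n
    capped at 0.5. Computed in log space; terms with a zero exponent are skipped
    so that theta = 0 does not require log10(0).
    """
    n, k = n_meioses, n_recombinants
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n_meioses > 0 and 0 <= n_recombinants <= n_meioses")
    theta = min(k / n, 0.5)
    log_likelihood = 0.0
    if k > 0:
        log_likelihood += k * math.log10(theta)
    if n - k > 0:
        log_likelihood += (n - k) * math.log10(1 - theta)
    return log_likelihood - n * math.log10(0.5)


# Hypothetical counts: 94 informative progeny with 0 or 1 recombinants.
print(round(lod_score(94, 0), 1))  # upper bound for 94 meioses
print(round(lod_score(94, 1), 1))
```

With no recombinants the score reduces to n x log10(2), which is the largest value 94 progeny can give; any recombinants or untyped animals lower it.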

Figure  5.  5’  RACE  of  pK823.  A.  The  kidney  Marathon  RACE  cDNA  from  Clontech  was 
screened  using  primer  K823-1  followed  by  a  nested  reaction  with  primer  K823-2  (shown  as 
arrows  with  1  and  2  labeled  above).  Arrows  above  show  direction  and  approximate  length  of 
sequence  analysis  of  pK823-2  along  with  the  sequence  analysis  of  pK823.  The  B1  element  is 
shown  as  a  striped  box  for  orientation.  The  partial  restriction  map  is  that  proposed  for  the 
pK823/pK823-2 combination. Restriction sites in parentheses are part of either the
adaptor primer used in the RACE procedure or the vector. Again, not all the Hae III sites
are shown for clarity. B. The sequence of the overlap region of the two clones with pK823 as the
top strand and pK823-2 as the bottom strand. Note that the two are nearly 100% identical. C.
Northern strip blot with a 1.2 kb Eco RI/Hinc II fragment from pK823-2 as a probe. The probe
recognizes the 6.6 knt transcript in all tissues as well as the 1.4 knt transcript in the tumor. The checkered box
represents pK823-2 and the horizontally striped box represents pK823. Primers K823-3 and
K823-4 are shown at the extreme 5’ end of the clone and were used to extend the RACE.

Figure 6. Isolation of pK823-4. Another round of 5’ RACE using the kidney Marathon RACE kit
from Clontech was performed with primer K823-3 followed by a nested PCR with primer K823-4
(arrows 3 and 4 at the end of pK823-2, see also Figure 5). Again the striped box represents the
B1 element for orientation. All three clones together are approximately 4.6 kb. B. The region of
pK823-2 and pK823-4 which overlap. The 117 bp of overlap between the two clones is nearly
100% identical. C. Northern strip blot using the entire 1.8 kb pK823-4 clone as a probe. This clone
hybridizes to two bands in the kidney, 1.8 knt and 1.9 knt, and only to the 1.9 knt transcript in
other tissues. All sizes are relative to the 18S and 28S ribosomal bands, which are marked. Lanes
marked with an A+ are poly(A)+ RNA and lanes without the A+ are total RNA from that
particular tissue. Dim 3 is a hormonally induced mammary tumor and C3H is an MMTV-induced
mammary tumor from a BALB/c mouse foster nursed onto a fC3H mouse to introduce the mouse
mammary tumor virus.

Figure 7. Sequence analysis of pB311 and pL11-6. A. The mouse brain cDNA isolate pB311
was used as a probe against Northern strip blots to determine expression patterns. The 800 bp
Eco RI fragment hybridized to the 6.6 knt transcript in brain, kidney and the TM10 mouse mammary cell line,
and to the 1.4 knt transcript in the TM10 cell line as well as the D2 tumor. Weak hybridization to
the 1.4 knt transcript could be detected in normal tissues with this probe. The pB311 probe did not
hybridize to the 6.6 knt transcript in the D2 tumor. The lower hybridizing bands most likely
represent cross hybridization to the B1 repetitive element portion of the probe. B. Sequence
analysis of pB311. The B1 repetitive element homology region is underlined. C. Sequence
analysis of pL11-6. The B1 repetitive element homology region is underlined. Neither of these
two clones showed homology to known genes in GenBank.

Figure 8. Sequence comparison of kokopelli with the D2 tumor clone. A. The 3’ RACE
procedure was done on poly(A)+ RNA from a D2 tumor and normal mammary gland from a virgin
mouse using a unique oligo dT adaptor primer and GSP-4 followed by GSP-5R (see Methods).
The D2 and normal mammary gland clones are shown as one sequence (D2/MG 3’ RACE) as they
were 100% identical. Note at position 24 the presence of an extra C residue in the kokopelli
transcript which is not present in the two other isolates. This extra C residue in kokopelli alters the
open reading frame, causing a stop codon downstream. Other changes are apparent between the
two clones: a C to T at position 147; an extra A residue in kokopelli at position 199; and a TT to AA at
positions 377 and 378. The nucleotide changes are shown in bold face type. B. The protein translation
of the D2 and mammary gland transcripts using the single-letter amino acid code.
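
The effect of a single inserted base on the reading frame, as described for Figure 8, can be illustrated with a short translation sketch. The names `translate`, `d2_mg` and `with_extra_c` are hypothetical; the input is only the first 60 nt of the D2/MG sequence as printed in Figure 8B, read in frame 1 with the standard genetic code. This excerpt is long enough to show the frame change after the inserted C but too short to reach the downstream stop codon mentioned in the legend.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; an incomplete trailing codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First 60 nt of the D2/MG 3' RACE sequence (Figure 8B, positions 1-60).
d2_mg = "CATGTGTGTTGTCACCGTCCTCCCCATTCCCCCCAAAAAAACCCAAACAAACAGAGTTTC"

# Add one C to the C run that starts at position 22, as in kokopelli.
# Anywhere within that run gives the same resulting sequence.
with_extra_c = d2_mg[:21] + "C" + d2_mg[21:]

print(translate(d2_mg))         # HVCCHRPPHSPQKNPNKQSF
print(translate(with_extra_c))  # HVCCHRPPPFPPKKPKQTEF
```

The two translations agree for the first eight residues and diverge from the ninth onward, which is the frame shift the legend attributes to the extra C.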

Figure 3.


A. 


1  CCAATAGAGG  GGATTTTTTT  TTCTAATGTT  GAAAAAGCTA  CAAGCTACCA 

51  GGCAGTGGTG  GCACATTTCT  TTAGTCCCAG  CACTTGGGAG  GCAGAGGCAG 

101  GTGGATTTCT  GAGTTCTAGG  CCAGCCTGGT  CTACAGAGTA  AGTTCCAGGA 

151  CAGCCAGGGC  TACACAGAGA  AACCCTGTCT  TGAGGGGGTG  GGGGGGGTGC 

201  AGAAAGGAAG  GTAGGAAAGA  AGGAAGGATC  GAACTAAGAG  CCAAAATACC 

251  ACTCAATAAA  ACTCCGGGAG  ATCATACAGC  GTGTGGGAGC  CAAAGGAAAA 

301  AAAATGAACA  GTTTGGTTTT  TAAAAATATT  CTGTGGATGT  AAAAAGGCAG 

351  TGAGCTTAGA  AACGATTTAT  GAAGTCCCAT  TTATTGAGTG  GTTCCCTTAA 

401  CTAGTCATCT  CCCTGAGAAA  ACAATGACCA  TTCAAGGGAC  ATTAGTATCT 

451  TGAAGGTGTG  TAGACAAATC  TCCACCTGGG  GATATCTCTT  AACTCTCACT 

501  ATTGCAGAAG  GTGTTGAGGA  GGCAGGCCTC  GGGTGGGGGT  TGGGAAGCAG 

551  GCACTGTGTG  GACGGGCCCT  CCCCAGCAGG  TTGATTATTA  TTTTGGTGAA 

601  GGTTTTTATT  TTTCTTTACC  TTCTGTAGGC  TCCACAATGC  CTGGATTTTA 

651  TTGATATTGG  TATTCTGGGC  TCC 

B.

pK823              133 ACAGAGTAAGTTCCAGGACAGCCAGGGCTACACAGAGAAACCCTGTCTTG 182
                       ||||||| |||||||| ||||||||||||||||||||||||||||||| |
Murine B1 element  335 ACAGAGTGAGTTCCAGAACAGCCAGGGCTACACAGAGAAACCCTGTCTCG 384

pK823              183 A
                       |
Murine B1 element  385 A

Figure 4

Chromosome 7 markers, in the order shown: Snrpn, D7Xrf211, K823-600, D7Trk1.
Distances shown (cM): 1.1, 2.1, 1.1.


Figure  5. 
B.

K823    CCAATAGAGGGGATTTTTTTTTCTAATGTTGAAAAAGCTACAAGCTACCA
        ||||||||||||||||||||||||||||||||||||||||||||||||||
K823-2  CCAATAGAGGGGATTTTTTTTTCTAATGTTGAAAAAGCTACAAGCTACCA

K823    GGCAGTGGTGGCACATTTCTTTAGTCCCAGCACTTGGGAGGCAGAGGCAG
        ||||||||||||||||||||||||||||||||||||||||||||||||||
K823-2  GGCAGTGGTGGCACATTTCTTTAGTCCCAGCACTTGGGAGGCAGAGGCAG

K823    GTGGATTTCTGAGTTCTAGGCCAGCCTGGTCTACAGAGTAAGTTCCAGGA
        ||||||||||||||||||||||||||||||||||||||||||||||||||
K823-2  GTGGATTTCTGAGTTCTAGGCCAGCCTGGTCTACAGAGTAAGTTCCAGGA

K823    CAGCCAGGGCTACACAGAGAAACCCTGTCTTGAGGGGGTGGGGGGGGTGC
        ||||||||||||||||||||||||||||||||||||||||||||||||||
K823-2  CAGCCAGGGCTACACAGAGAAACCCTGTCTTGAGGGGGTGGGGGGGGTGC

K823    A..GAAAGGAAGGTAGGAAAGAAGGAAGGATCGAACTAAGAGCCAAAATA
        |  |||||||||||||||||||||||||||||||||||||||||||||||
K823-2  AGGGAAAGGAAGGTAGGAAAGAAGGAAGGATCGAACTAAGAGCCAAAATA

K823    CCACTCAATAAAA
        |||||||||||||
K823-2  CCACTCAATAAAA


Figure  6. 


B. 

K823-2  GCTCTTAGCAACATTTTCCATAGTATTTTCTCAAAATGGTGCGTTAGATA
        ||||||||||||||||||||||||||||||||||||||||||||||||||
K823-4  GCTCTTAGCAACATTTTCCATAGTATTTTCTCAAAATGGTGCGTTAGATA

K823-2  TAATCTAGGGGTTCCTCAGTATAAGCGCAAAACACACAGTAAGATACTAG
        ||||||| ||||||||||||||||||||||||||||||||||||||||||
K823-4  TAATCTA.GGGTTCCTCAGTATAAGCGCAAAACACACAGTAAGATACTAG

K823-2  CCAATGGTACATGTAGC
        ||||||||||||||||:
K823-4  CCAATGGTACATGTAGN

TGTGTGGGGC
AGATGCTGTC
AGCTTGGCTC
GGAAAACCCA
ATTTTCAGGA
ATAAGGAAGA
GAGTGGCAGA
GTGGAACACC
AGATGGATGA
ACGTCCTCTG
TTGGCCTCTG
CTTGCGCTCA
CAGGTGTGTG
CATATTCACA
CACATGCATA
CATCACACAC
CAAAAGGAAA
AAGGGAAGCA
GAACGTCCCA
TCTATCTCTC
ATGTTACTTA
CAAGGGTCGT
551
CGGATTTCTG
AGTTCGAGGC
TACAGAGTGA
ATACAGAGAA
ACCCTGCCGG
AATTCTTTTG
TGGAAGAAAT
ACTCATAAGC
CACCTCTGTT
A

Figure 7. continued

C.

  1  GAATTCGGCT
     CATGGAGACT
 51  GTTCAGCATT TCTTTGTCTA GAAGATAAGG GTAGGGATAA TGAGATTTTT
101  CTTGTGGAAT TAAGGAATGT ATTTTGCATA ACTCTTGTGA GCAGATGTTA
151  TATTAGTGTC TCTTGCTTCT CTGTCAGAGC CTGTGCAGCT GGCCTTGAAC
201  TTCAGGGGCT TTGTGATGCT TAGCCTACTC TATGTTGAAT ATGGTGCTGA
251  ACTGTGAAGT TTATTAGTTC TTACAATGTA AAAGAAACCC AGCTTGATGG
301  GCTGGAGAGA TGGCTCAGTA GTTAAGGGCA CTGACTGTCC TTCCAGAGGT
351  CCTGAGTTCA ATTCCCAGCA CCCACATGGT GGCTCACCAT GGGATCCCAT
401  GCCCTCTGAA GATAGCTACA ACATACTCAT ATAAATAATT TTTTTTTTAA
451  GAAAAAAAAC CCAGCTTGGG TACTATGTAA CAATATGCAC GCCTTTAGTC
501  CCAGCACTTG GGAAGCAGGT GGATCTTTTG GTTTAAGGCC AACCTGGTCT
551  ACAGGAGAGT TTTAGGATAG CCAGGGCTGC ACAGAGAGAT CCTGTCTCAA
601  GAAAAAAAGA CATTGCTATC CCTTTATCAT AGGGTTTTGG TTTTTTTTTT
     ATTTTTCTGG
     TCTGTGTAGC
     CTGGGATTAA
     AGGCGTGCAC
     CGGCTTTTAT
     CATAGTTTTT
     AGTTGAAATT
     ATTTTTTGGT
     CAGCTGGTTT
     GTTGTTTGTG
     TCCCTATGTG
     GACCCACACT
     GGCCTCAGAC
     GTCTCTCTCT
     CCCCAGTGCA
     TAATGGCAAA
     TATTAAACTT
     AACGTTTTTT
     CAAAAAAAAA
     AAACCGGAAT
     TC

Figure 8

kokopelli  CATGTGTGTTGTCACCGTCCTCCCCCATTCCCCCCAAAAAAACCCAAACA
           ||||||||||||||||||||| ||||||||||||||||||||||||||||
D2/MG      CATGTGTGTTGTCACCGTCCT.CCCCATTCCCCCCAAAAAAACCCAAACA

kokopelli  AACAGAGTTTCCCTGTGTATAGCCCTGGCTGTCCTGGAACTTACTCTGTA
           ||||||||||||||||||||||||||||||||||||||||||||||||||
D2/MG      AACAGAGTTTCCCTGTGTATAGCCCTGGCTGTCCTGGAACTTACTCTGTA

kokopelli  GACAAGGCTGGCCCAGAATGCAAAGATTTTGTGTTGAGTATTGGGATTAA
           |||||||||||||||||||||||||||||||||||||||||||||| |||
D2/MG      GACAAGGCTGGCCCAGAATGCAAAGATTTTGTGTTGAGTATTGGGACTAA

kokopelli  AGGACCCCCCCCCCTTTTTAGGTTTAATGGACGTCCCGTCTGCTGTCAAC
           ||||||||||||||||||||||||||||||||||||||||||||||| ||
D2/MG      AGGACCCCCCCCCCTTTTTAGGTTTAATGGACGTCCCGTCTGCTGTC.AC

kokopelli  CTGAATGTTGTTTCTCACTCCTTTCCCTTTGTTCTGTTCTGCAGCTCCTG
           ||||||||||||||||||||||||||||||||||||||||||||||||||
D2/MG      CTGAATGTTGTTTCTCACTCCTTTCCCTTTGTTCTGTTCTGCAGCTCCTG

kokopelli  GGATGAGACCACCCATGGGAGGCCACATGCCCATGATGCCCGGACCTCCC
           ||||||||||||||||||||||||||||||||||||||||||||||||||
D2/MG      GGATGAGACCACCCATGGGAGGCCACATGCCCATGATGCCCGGACCTCCC

kokopelli  ATGATGAGACCTCCTGCCCGCCCTATGATGGTGCCCACCCGGCCTGGCAT
           ||||||||||||||||||||||||||||||||||||||||||||||||||
D2/MG      ATGATGAGACCTCCTGCCCGCCCTATGATGGTGCCCACCCGGCCTGGCAT

kokopelli  GACCCGGCCAGACAGATAAGAGCAGTTGCGCTCTTGATGGTTTTGTATTT
           |||||||||||||||||||||||||  |||||||||||||||||||||||
D2/MG      GACCCGGCCAGACAGATAAGAGCAGAAGCGCTCTTGATGGTTTTGTATTT

kokopelli  CTTGTTCTGTTCCACCAGGAGCTCTTGGTGCTGAGCCCGAGTGTTTACTA
           ||||||||||||||||||||||||||||||||||||||||||||||||||
D2/MG      CTTGTTCTGTTCCACCAGGAGCTCTTGGTGCTGAGCCCGAGTGTTTACTA

kokopelli  GATGCATGGAAAGGAAACTTCCCTTCCTAACTGAATATTTTTGGAGGGAG
           ||||||||||||||||||||||||||||||||||||||||||||||||||
D2/MG      GATGCATGGAAAGGAAACTTCCCTTCCTAACTGAATATTTTTGGAGGGAG

kokopelli  AAATAATACAAAAAAGTGCAGTTTTCACTTATATTGTGAAATGTGAAAAT
           ||||||||||||||||||||||||||||||||||||||||||||||||||
D2/MG      AAATAATACAAAAAAGTGCAGTTTTCACTTATATTGTGAAATGTGAAAAT

Figure 8 B

    CATGTGTGTTGTCACCGTCCTCCCCATTCCCCCCAAAAAAACCCAAACAAACAGAGTTTC
  1 ---------+---------+---------+---------+---------+---------+ 60
    GTACACACAACAGTGGCAGGAGGGGTAAGGGGGGTTTTTTTGGGTTTGTTTGTCTCAAAG
    H  V  C  C  H  R  P  P  H  S  P  Q  K  N  P  N  K  Q  S  F

    CCTGTGTATAGCCCTGGCTGTCCTGGAACTTACTCTGTAGACAAGGCTGGCCCAGAATGC
 61 ---------+---------+---------+---------+---------+---------+ 120
    GGACACATATCGGGACCGACAGGACCTTGAATGAGACATCTGTTCCGACCGGGTCTTACG
    P  V  Y  S  P  G  C  P  G  T  Y  S  V  D  K  A  G  P  E  C

    AAAGATTTTGTGTTGAGTATTGGGACTAAAGGACCCCCCCCCCTTTTTAGGTTTAATGGA
121 ---------+---------+---------+---------+---------+---------+ 180
    TTTCTAAAACACAACTCATAACCCTGATTTCCTGGGGGGGGGGAAAAATCCAAATTACCT
    K  D  F  V  L  S  I  G  T  K  G  P  P  P  L  F  R  F  N  G

    CGTCCCGTCTGCTGTCACCTGAATGTTGTTTCTCACTCCTTTCCCTTTGTTCTGTTCTGC
181 ---------+---------+---------+---------+---------+---------+ 240
    GCAGGGCAGACGACAGTGGACTTACAACAAAGAGTGAGGAAAGGGAAACAAGACAAGACG
    R  P  V  C  C  H  L  N  V  V  S  H  S  F  P  F  V  L  F  C

    AGCTCCTGGGATGAGACCACCCATGGGAGGCCACATGCCCATGATGCCCGGACCTCCCAT
241 ---------+---------+---------+---------+---------+---------+ 300
    TCGAGGACCCTACTCTGGTGGGTACCCTCCGGTGTACGGGTACTACGGGCCTGGAGGGTA
    S  S  W  D  E  T  T  H  G  R  P  H  A  H  D  A  R  T  S  H

    GATGAGACCTCCTGCCCGCCCTATGATGGTGCCCACCCGGCCTGGCATGACCCGGCCAGA
301 ---------+---------+---------+---------+---------+---------+ 360
    CTACTCTGGAGGACGGGCGGGATACTACCACGGGTGGGCCGGACCGTACTGGGCCGGTCT
    D  E  T  S  C  P  P  Y  D  G  A  H  P  A  W  H  D  P  A  R

    CAGATAAGAGCAGAAGCGCTCTTGATGGTTTTGTATTTCTTGTTCTGTTCCACCAGGAGC
361 ---------+---------+---------+---------+---------+---------+ 420
    GTCTATTCTCGTCTTCGCGAGAACTACCAAAACATAAAGAACAAGACAAGGTGGTCCTCG
    Q  I  R  A  E  A  L  L  M  V  L  Y  F  L  F  C  S  T  R  S

    TCTTGGTGCTGAGCCCGAGTGTTTACTAGATGCATGGAAAGGAAACTTCCCTTCCTAACT
421 ---------+---------+---------+---------+---------+---------+ 480
    AGAACCACGACTCGGGCTCACAAATGATCTACGTACCTTTCCTTTGAAGGGAAGGATTGA
    S  W  C  *

    GAATATTTTTGGAGGGAGAAATAATACAAAAAAGTGCAGTTTTCACTTATATTGTGAAAT
481 ---------+---------+---------+---------+---------+---------+ 540
    CTTATAAAAACCTCCCTCTTTATTATGTTTTTTCACGTCAAAAGTGAATATAACACTTTA

    GTGAAAATAAAGTCATCAGCTCTTTTAGTTAAAAAAAAAAAAAAAAAAAAAA
541 ---------+---------+---------+---------+---------+-- 592
    CACTTTTATTTCAGTAGTCGAGAAAATCAATTTTTTTTTTTTTTTTTTTTTT